Systems and methods for generating multi-level hypervideo summaries

ABSTRACT

A hypervideo summary comprised of multiple levels of related content and appropriate navigational links can be automatically generated from a media file such as a linear video. A number of algorithms and selection criteria can be used to modify how such a summary is generated. Viewers of an automatically-generated hypervideo summary can interactively select the amount of detail displayed for each portion of the summary. This selection can be done by following explicit navigational links, or by changing between media channels that are mapped to the various levels of related content. This description is not intended to be a complete description of, or limit the scope of, the invention. Other features, aspects, and objects of the invention can be obtained from a review of the specification, the figures, and the claims.

CROSS-REFERENCED CASES

The following applications are cross-referenced and incorporated herein by reference:

U.S. patent application Ser. No. 10/116,026 entitled “A System for Authoring and Viewing Detail on Demand Video,” by Andreas Girgensohn et al., filed Apr. 3, 2002, Attorney docket no.: FXPL-01036US0.

U.S. patent application Ser. No. 10/116,012 entitled “Reduced Representations of Video Sequences,” by Andreas Girgensohn et al., filed Apr. 3, 2002, Attorney docket no.: FXPL-01038US0.

FIELD OF THE INVENTION

The present invention relates to generating multi-level summaries for video files and segments.

BACKGROUND

Several approaches to interactive video have been developed to allow a user to interface with digital video systems. One such approach provides optional side trips, which allow users to follow a link out of the currently playing video in order to watch an alternate video sequence. At the end of the alternate sequence, or upon user input, the video presentation returns to the original video departure point and continues to play from that point. For example, some DVDs include options for viewers to follow links out of the currently playing video to watch other video clips. When a link is active, an icon appears on top of the playing video. The viewer can press a button on a remote control to jump to the alternative video. Certain DVD movies, for instance, provide links that take a viewer to video segments explaining how a particular scene in the movie was filmed. Afterwards, the original video continues from where the viewer left off.

Expanding on the concept of optional side trips in video, detail-on-demand video includes one or more base video sequences, each having one or more alternate video sequences. Each alternate video sequence provides additional details related to the base video sequence. During video playback, users can select the alternate video sequence to view this additional detail.

Upon user input or completion of the alternate video sequence, the presentation returns to the base video sequence. The author may determine the location where the presentation resumes. Additionally, alternate video sequences can include links to other video sequences, thereby creating a hierarchical structure in which video sequences providing additional detail may in turn contain links to sequences having even more detail.

The nature of detail-on-demand video is well suited to applications such as creating training or “how-to” videos. In such an application, viewers can control the level of explanation they receive by following links to the appropriate level. Base video sequences can present an overview of the information at an abstract or relatively “high” level. Users can follow a link from a base video sequence in order to view a more detailed presentation in an alternate video sequence. Further detail can be provided by linking the alternate video sequence to yet another video sequence, which in turn can link to another video sequence, and so on. This hierarchical presentation allows the viewer to select and view detailed presentations of certain topics, such as topics in which the viewer needs the most help, while skipping over or viewing high-level presentations of more familiar portions. Such video guides can serve a wide audience by presenting a customized level of detail for each viewer, and can save the viewer time by avoiding detailed presentations of information already familiar to, or of little interest to, the user.

Home video editing is another application for detail-on-demand video. Home users can create video summaries of family activities or other home movies. More detailed presentations of different activities can be linked to the base video sequence to provide additional footage of interest. For example, a family video Christmas card may contain a main video sequence summarizing family activities for the year. Viewers can select a link during each portion of the main video sequence to view additional video from the family activity of interest. A grandparent, for instance, may select additional video sequences of grandchildren, while other relatives may select additional details of a party or family reunion.

Detail-on-demand video was designed to support the authoring and use of interactive video in a wide variety of applications. Characteristics of video representations meeting this design goal include a hierarchical structure in which video clips are combined into composites, as well as links between elements in this hierarchy.

FIG. 1 shows a diagram of an exemplary detail-on-demand summary as described in U.S. patent application Ser. No. 10/116,026, including two hierarchically organized video segments 100, 110 and three links 116, 118, 120 between those video segments. The first link 116 is from “composite 3” 104 to “composite 6” 110, the second link 118 is from “clip 5” 122 to “composite 8” 114, and the third link 120 is from “clip 11” 126 to “clip 7” 124. If more than one link can be active at a particular time, which can happen if links are specified for multiple levels of the hierarchy, the lowest-level link can be set to take precedence.

While detail-on-demand videos can provide an interactive summary for access into longer linear videos, human authoring of such summaries is very time-consuming and not cost-effective if the summary will only be used a few times. While the editing of video typically involves the selection and sequencing of video clips into a linear presentation, which in itself can be a lengthy process, authoring detail-on-demand video is more complicated, as it involves the authoring and interlinking of one or more such linear video presentations.

In many such presentations, individual video clips must be selected and grouped into video composites as higher-level building blocks. Video clips and/or composites must be selected to be the source or destination anchor for each navigational link used to link the building blocks of related material. Source anchors must be selected that can specify the starting point at which a link becomes active, as well as the length of time for which the link is active. Destination anchors must be selected that can specify the starting point and length of the video played as a result of a viewer traversing the active link. Unlike hyperlinks in Web pages or in most hypervideo systems, the link destination is not just a starting point but an interval of content. The person creating the summary must also determine where playback will continue upon completion of the video viewed using the link or when the viewer aborts the playing of that video.

The length of time necessary for an individual to create such a detail-on-demand summary can be undesirable in many situations, such as the summarizing of home movies for consumer applications. It would be preferable in many situations to shift most, if not all, of the time and effort necessary to create such hypervideo summaries away from the end users.

DESCRIPTION OF THE FIGURES

FIG. 1 is a diagram showing a multi-level summary of the prior art.

FIG. 2 is a diagram showing the segmenting of a linear video into clips, in accordance with one embodiment of the present invention.

FIG. 3 is a diagram showing the selection of clips from the diagram of FIG. 2.

FIG. 4 is a diagram showing a linked multi-level summary using the clip selection of FIG. 3.

FIG. 5 is a diagram showing another automatically generated interactive video including three summary levels and the source video.

FIG. 6 is a diagram showing a multi-level summary using the channel metaphor in accordance with one embodiment of the present invention.

FIG. 7 is a flowchart showing a process for automatically generating hypervideo summaries in accordance with one embodiment of the present invention.

FIG. 8 is a flowchart showing a process for automatically generating multi-channel summaries in accordance with one embodiment of the present invention.

FIG. 9 is a flowchart showing another process for automatically generating multi-channel summaries in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION

Systems and methods in accordance with embodiments of the present invention can overcome deficiencies in existing video summarization approaches by automatically generating hypervideo summaries comprised of multiple levels of related content. Such a summary can be generated by automatically selecting short clips from the original video, such as through an authoring and playing interface for “detail-on-demand” video. Such a process can generate summaries at different levels of detail, group clips into composites, and place links between composites at different summary levels. Clips can be selected based on properties or “goodness” criteria, such as technical suitability, which can be determined automatically from factors such as camera motion, and on temporal location in the source video. Certain embodiments can also allow the resulting hypervideo to be edited in the workspace.

Detail-on-demand video summaries differ from other hierarchical video summaries in that users can request additional detail while playing the video, rather than having to use a separate interface consisting of keyframes or a tree view. While each level of a detail-on-demand summary can be similar to a linear video summary, a significant difference is that users are able to request additional detail for parts of the video rather than being restricted to a predetermined level of detail.

Each level of a generated, interactive summary can be of a different length, with the top level being a rapid overview of the content and the lowest level containing the entire source video. The generation of the multi-level video summary can involve at least three basic decisions: (1) how many levels to generate (and, possibly, the length of each level), (2) which clips from the source video to show in each summary, and (3) which links to generate between the levels of the summary.

An exemplary approach to automatically generating such a hypervideo summary is shown in FIG. 7. Such an approach can be used with any appropriate device, such as a desktop PC, video workstation, digital video camera, or home electronic device, and can be implemented in hardware, software, or a combination of hardware and software. A linear video, such as a home movie or production video, can be automatically divided into clips and takes using any appropriate segmenting criteria or determination mechanism 500, such as those described herein. Clips from the video can be automatically selected using predetermined criteria 502. The number of levels to be included in a summary can be determined automatically 504, as can the length of each level to be generated 506. Clips to be used for each level can be automatically selected and grouped into composites where appropriate 508. Links can then be automatically generated or provided between composites and/or clips of different levels having related material 510.

The number of levels to be included in such an interactive summary can depend on any appropriate characteristic of the video, such as the length of the source video. For example, a single 30-second video summary might be generated for videos that are under five minutes in length. For a video between 5 and 20 minutes in length, two summaries can be generated: one 30 seconds in length and the other 3 minutes in length. For videos over 20 minutes in length, three summaries can be generated: one 30 seconds long, one three minutes long, and one that is one quarter the length of the total video, up to a maximum of 10 minutes. The number of summaries, the length of each summary, and the source-video lengths at which the set of generated summaries changes can all vary. The lengths and numbers of summaries can be hard-coded into the system, placed into an options display for selection by a user, or left entirely to the choice of the user. Where the choices are not hard-coded, they can be made by any appropriate means, such as by selecting from a list or entering values into a text area.
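As a sketch only, the example thresholds above can be expressed as a small Python function. The function name and the handling of the exact 5- and 20-minute boundaries are assumptions; the text leaves both the thresholds and the lengths open to variation.

```python
from typing import List


def summary_lengths(video_seconds: float) -> List[float]:
    """Return summary lengths in seconds, shortest (top level) first.

    Hypothetical helper: thresholds mirror the example values in the text;
    treating exactly 5 and exactly 20 minutes as the two-summary case is an
    assumption.
    """
    minutes = video_seconds / 60
    if minutes < 5:
        return [30.0]                      # one 30-second summary
    if minutes <= 20:
        return [30.0, 180.0]               # 30 seconds and 3 minutes
    # Third level: one quarter of the source video, capped at 10 minutes.
    return [30.0, 180.0, min(video_seconds / 4, 600.0)]
```

A one-hour source video, for instance, yields summaries of 30 seconds, 3 minutes and 10 minutes, the last being capped because a quarter of the video would be 15 minutes.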

An exemplary algorithm can segment a linear video or video file into video segments, such as “takes” and “clips.” Takes and clips can be defined in any of a number of appropriate ways, using any of a number of segmenting criteria that would be understood by one of ordinary skill in the art. For example, when segmenting an un-produced or “home” video, takes can be defined by the turning on and/or turning off of the camera that is recording the video. Clips can be defined as sub-segments of these takes, generated by analyzing the video and determining, for example, good-quality segments. Here, good quality can be defined by smooth camera motion, or a lack of camera motion, as well as good lighting levels or other measures of video quality. For produced or other types of video, takes can be defined as scenes, and clips can be the shots of the video. Scenes and clips can be identified using any of a number of existing techniques. Alternatively, an exemplary algorithm can assume that the video has already been segmented into “takes” and “clips.”
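One way to derive clips from a take under the smooth-or-absent camera motion criterion is sketched below. It assumes a hypothetical per-frame camera-motion score has already been computed for the take, and treats a clip as a maximal run of frames whose score stays at or below a threshold for at least a minimum number of frames. Lighting and other quality measures are omitted for brevity.

```python
from typing import List, Optional, Tuple


def find_clips(motion: List[float], threshold: float,
               min_frames: int) -> List[Tuple[int, int]]:
    """Return (first_frame, end_frame) pairs, end exclusive, for steady runs.

    Hypothetical helper: `motion` holds one camera-motion score per frame of
    a single take; lower means steadier.
    """
    clips: List[Tuple[int, int]] = []
    start: Optional[int] = None          # first frame of the current steady run
    for i, score in enumerate(motion):
        if score <= threshold:
            if start is None:
                start = i
        else:
            # Motion too strong: close the run and keep it if long enough.
            if start is not None and i - start >= min_frames:
                clips.append((start, i))
            start = None
    if start is not None and len(motion) - start >= min_frames:
        clips.append((start, len(motion)))   # run that reaches the end of the take
    return clips
```

Raising `min_frames` discards short steady moments, which roughly corresponds to enforcing a minimum clip length.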

An exemplary algorithm can select clips to use for each summary level using a selection process that may be closely related to traditional video summarization. For example, an algorithm can select clips based on the distribution of the clips in the video. Such an algorithm can be geared toward un-produced video, where clips may have been selected for their video quality. An alternative algorithm can assume that an external goodness measure has been computed for the clips or shots. Such an algorithm can be more suitable for produced video, such as professional training videos, in which the clips and scenes can be well defined.

In developing an algorithm such as those described above for un-produced video, one approach attempts to identify an array of a number (m) of high-quality video clips through an analysis of video properties such as camera motion and lighting. An average clip length (C) can be calculated, predetermined, or selected, such as by a user or system developer, so that the number (n) of clips needed for a summary is the length of the summary (S) in seconds divided by the average clip length, or n = S/C. So, for a three-minute video summary and an average clip length of 3.5 seconds, this algorithm would suggest selecting approximately 51 clips for the summary.
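Worked through for the three-minute example, with the result rounded to a whole number of clips:

```latex
n = \frac{S}{C} = \frac{180\,\text{s}}{3.5\,\text{s}} \approx 51.4
\quad\Longrightarrow\quad n \approx 51
```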

In some embodiments, it can be guaranteed that the first and last clips are contained in each summary, with the remainder of the clips being distributed, evenly or otherwise, in the array of potential clips. If even distribution is selected, such an algorithm can select one clip every (m−1)/(n−1) potential clips. FIG. 2 shows an exemplary linear video summary 200 composed of 15 high-value clips that were automatically identified in a four-take source video. These clips can represent the entire original video, or a subset of the entire video. Because an estimate of the average clip length is used, such an algorithm can generate summaries of approximately the desired length, rather than exactly the requested length. Such an algorithm can easily be altered to support applications requiring summaries of exact lengths, such as by modifying in/out points in the selected clips rather than accepting the in/out points determined by video analysis.
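A minimal sketch of the even-distribution rule, assuming the m candidate clips are indexed in source order: indices are spaced (m−1)/(n−1) apart and rounded half up, which always keeps the first and last clip. The function name is hypothetical.

```python
from typing import List


def select_evenly(m: int, n: int) -> List[int]:
    """Indices of n clips chosen from m candidates in source order.

    Hypothetical helper: first and last clips are always included, with one
    clip taken every (m - 1) / (n - 1) candidates, rounded half up.
    """
    if n >= m:
        return list(range(m))      # not enough candidates: use them all
    if n == 1:
        return [0]
    step = (m - 1) / (n - 1)       # step >= 1, so indices never repeat
    return [int(i * step + 0.5) for i in range(n)]
```

For the 15 clips of FIG. 2 and a five-clip summary, this picks candidates 0, 4, 7, 11 and 14.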

An alternative algorithm can use the same segmentation of the video of length L into takes and clips. For the first level, a length L₁ can be set, such as 30 seconds, and a clip length C₁ can be set, such as 3 seconds, to pick n = L₁/C₁ clips. The centers of n intervals of length L/n can be checked, and a clip can be included from each of the takes at those positions. This can be seen, for example, at the bottom of FIG. 3 for timelines 202, 204 with the centers of 3 and 6 intervals, respectively. The clip closest to the interval center can be selected. If more than one interval center hits the same take, the clip closest to the center of the take can be selected. If fewer than n clips are selected, an algorithm or system can look for takes that have not been used, such as because those takes were too short to be hit. One clip can be selected from each of those takes, starting with the clip that is furthest away from the already-picked clips, until n clips are picked or there are no more unused takes. If still fewer than n clips are picked, an additional clip can be picked from each take in descending order of the number of clips in a take, or in descending order of take duration, until enough clips are picked; the clips shown as selected for Level 2 in FIG. 3, for example, could have been picked using this approach. Picking three or more clips per take can continue if picking two clips per take is insufficient. A similar approach can be used for the second level with length L₂, such as 180 seconds, and clip length C₂, such as 5 seconds. FIG. 3 shows an example of how such an algorithm can select clips from the same source video 200 as shown in FIG. 2. Since the takes and clips are of relatively even lengths, both algorithms can produce similar results. Different application requirements can, however, make one algorithm more suitable.
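The three passes described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact procedure of any embodiment: takes are given as hypothetical (start, end, clips) tuples, distances are measured between clip midpoints, the extra-clip pass orders takes by clip count (the text also allows take duration), and the extra clip taken from a take is assumed to be the one furthest from the clips already picked. The function name `select_level` is hypothetical.

```python
from typing import Dict, List, Tuple

Clip = Tuple[float, float]               # (start, end) in source-video seconds
Take = Tuple[float, float, List[Clip]]   # hypothetical: (start, end, clips in time order)


def _mid(clip: Clip) -> float:
    return (clip[0] + clip[1]) / 2


def select_level(takes: List[Take], L: float, n: int) -> List[Clip]:
    """Pick up to n clips for one summary level from a video of length L."""
    picked: Dict[int, List[Clip]] = {}   # take index -> clips chosen from it

    def total() -> int:
        return sum(len(v) for v in picked.values())

    def gap(clip: Clip) -> float:
        """Distance from a clip to the nearest already-picked clip."""
        chosen = [c for v in picked.values() for c in v]
        return min((abs(_mid(clip) - _mid(c)) for c in chosen), default=float("inf"))

    # Pass 1: centers of n intervals of length L/n, grouped by the take they hit.
    hits: Dict[int, List[float]] = {}
    for k in range(n):
        center = (k + 0.5) * L / n
        for ti, (start, end, _) in enumerate(takes):
            if start <= center < end:
                hits.setdefault(ti, []).append(center)
                break
    for ti, centers in hits.items():
        start, end, clips = takes[ti]
        if clips:
            # One hit: clip nearest that center; several hits: nearest the take center.
            target = centers[0] if len(centers) == 1 else (start + end) / 2
            picked[ti] = [min(clips, key=lambda c: abs(_mid(c) - target))]

    # Pass 2: one clip from each unused take, furthest from the picked clips first.
    while total() < n:
        candidates = [(ti, c) for ti, (_, _, clips) in enumerate(takes)
                      if ti not in picked for c in clips]
        if not candidates:
            break
        ti, clip = max(candidates, key=lambda tc: gap(tc[1]))
        picked[ti] = [clip]

    # Pass 3: one more clip per take per round, takes with the most clips first.
    order = sorted(range(len(takes)), key=lambda ti: len(takes[ti][2]), reverse=True)
    progress = True
    while total() < n and progress:
        progress = False
        for ti in order:
            if total() >= n:
                break
            rest = [c for c in takes[ti][2] if c not in picked.get(ti, [])]
            if rest:
                picked.setdefault(ti, []).append(max(rest, key=gap))
                progress = True

    return sorted(c for v in picked.values() for c in v)
```

For the first level the call would be `select_level(takes, L, n)` with n = L₁/C₁; the second level repeats the call with n = L₂/C₂.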

Both of the exemplary algorithms described above can provide glimpses into a source video at somewhat regular intervals, with the un-produced video algorithm using the number of clips or shots as a measure of distance and the produced video algorithm using playing time as a measure of distance. Another algorithm could, for example, use a “goodness” value for clips and select the highest-value clips first. Such an algorithm could guarantee that each level of the summary would be a superset of the higher (shorter) levels of the summary. Such an algorithm can be of greater value for edited content, such as training video, where more general content may be preferred to video on more specialized topics.
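A sketch of the goodness-ranked alternative, assuming a hypothetical per-clip goodness score indexed in source order. Because every level is a prefix of one ranking, each longer level is automatically a superset of the shorter ones; each level is then re-sorted so it plays in source order.

```python
from typing import List


def nested_levels(goodness: List[float], sizes: List[int]) -> List[List[int]]:
    """Clip indices for each level, shortest level first, in source order.

    Hypothetical helper: `goodness[i]` is an externally computed score for clip i.
    """
    # One global ranking, best first; ties keep source order (sorted is stable).
    ranked = sorted(range(len(goodness)), key=lambda i: goodness[i], reverse=True)
    # Level with k clips = top-k of the ranking, replayed in source order.
    return [sorted(ranked[:k]) for k in sorted(sizes)]
```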

Once a multi-level summary has been generated, links can be generated between the levels. Links can be used to take a user or viewer from a clip at one level to the corresponding location(s) in the next lower level. Viewers can navigate from clips of interest to additional content from the same, or approximately the same, period. FIG. 4 shows an example of a summary 300 having two levels 302, 304 created from the fifteen high-value clips identified from the four-take source video of FIGS. 2 and 3.

Generating links can involve a number of decisions. A link in one embodiment can be a combination of a source anchor, a destination anchor, a label, and return behaviors for both completed and aborted playback of a destination. Link generation can, for example, be based on takes or scenes. All clips from a particular take can be grouped into a composite that will be a source anchor for a link to the next level. A composite in a higher-level summary can be linked to the sequence of clips from the same take in the next level. If a take is not represented in a higher level, that take can be included in the destination anchor for the link from the previous take. For example, the link 308 from the middle clip in the top level of the summary shown in FIG. 4 has Clip 8 in Level 1 as its source anchor. The destination anchor 310 is a composite composed of Clip 7 and Clip 9. Clip 9, which is from Take 3, has been included because there was no clip from Take 3 in Level 1.
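The take-based linking rule can be sketched as below. Clips are represented as hypothetical (take, clip) pairs in source order, with takes numbered chronologically, and the function name is hypothetical. Lower-level clips from takes that precede the first represented take are not attached to any link in this sketch.

```python
from itertools import groupby
from typing import List, Tuple

ClipRef = Tuple[int, int]   # hypothetical: (take number, clip number), in source order


def take_links(upper: List[ClipRef],
               lower: List[ClipRef]) -> List[Tuple[List[ClipRef], List[ClipRef]]]:
    """Return (source anchor, destination anchor) pairs from one level to the next."""
    # Source anchors: consecutive upper-level clips from one take form a composite.
    sources = [list(group) for _, group in groupby(upper, key=lambda ref: ref[0])]
    links = []
    for i, source in enumerate(sources):
        take = source[0][0]
        # The destination runs up to (not including) the next take represented in
        # the upper level, so takes missing from the upper level fall into this link.
        next_take = sources[i + 1][0][0] if i + 1 < len(sources) else float("inf")
        destination = [ref for ref in lower if take <= ref[0] < next_take]
        links.append((source, destination))
    return links
```

With `upper = [(1, 2), (2, 8), (4, 13)]` and `lower = [(1, 2), (1, 4), (2, 7), (3, 9), (4, 11), (4, 13)]` (take assignments invented apart from Clip 9 belonging to Take 3), the second link runs from `[(2, 8)]` to `[(2, 7), (3, 9)]`, mirroring the FIG. 4 case in which Clip 9 is pulled in because Take 3 has no clip in Level 1.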

Link labels can be used to provide information about the number of clips and the length of the destination anchor. Algorithms that generate textual descriptions for video based on metadata, such as transcripts, can be used to produce labels with more semantic meaning. Link return behaviors for both completed and interrupted destination playback can default to returning to the point of original link traversal. Returning to the end of the source anchor, rather than to the point of link traversal, upon destination completion can provide a more efficient summary. Having both return behaviors resume at the beginning of the source anchor can provide the greatest context for the person viewing the summary.
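The link components and return behaviors described above might be modeled as follows. The class and enum names are hypothetical, times are in seconds on the timeline of the level containing the source anchor, and the label shows only the clip count and destination length, without any metadata-derived text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

Interval = Tuple[float, float]   # (start, end) in seconds


class ReturnTo(Enum):            # hypothetical names for the three return behaviors
    TRAVERSAL_POINT = 1          # default: resume where the viewer followed the link
    ANCHOR_END = 2               # skip the rest of the source anchor (more efficient)
    ANCHOR_START = 3             # replay the source anchor (most context)


@dataclass
class Link:
    source: Interval
    destination: List[Interval]
    on_complete: ReturnTo = ReturnTo.TRAVERSAL_POINT
    on_abort: ReturnTo = ReturnTo.TRAVERSAL_POINT

    @property
    def label(self) -> str:
        """Clip count and total length of the destination anchor."""
        seconds = sum(end - start for start, end in self.destination)
        noun = "clip" if len(self.destination) == 1 else "clips"
        return f"{len(self.destination)} {noun}, {seconds:.0f} s"

    def resume_time(self, traversal: float, completed: bool) -> float:
        """Where playback resumes after the destination completes or is aborted."""
        behavior = self.on_complete if completed else self.on_abort
        if behavior is ReturnTo.ANCHOR_END:
            return self.source[1]
        if behavior is ReturnTo.ANCHOR_START:
            return self.source[0]
        return traversal
```

Setting `on_complete=ReturnTo.ANCHOR_END` while leaving `on_abort` at its default gives the "more efficient summary" variant: a viewer who watches the detail to the end skips the rest of the source anchor, while one who aborts returns to where they left.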

While algorithms such as those described above can be used to automatically generate multi-level summaries with navigational links between the levels of summary to support video browsing, authors can be given the ability to refine the automatically generated interactive summary, such as in cases where the interactive summary may be used many times. An example of such a case is an index to a training video. A graphical layout for editing a hypervideo summary can be automatically generated in the workspace. Each layer of the summary can be presented in the layout as a horizontal list of clips and/or composites. Links can be represented in the workspace through the normal link visualization of arrows into and out of the keyframes and composite visualizations. FIG. 5 shows part of an exemplary four-level summary generated by an un-produced video algorithm, such as described above, for a one-hour, 33-take martial arts video.

Correlated Media Channels

While automatically generated hypervideo summaries such as those described above can provide significant advantages over existing video summaries, a problem that may remain for certain users is that hypermedia systems consisting of linked audio and/or video have proven difficult for people to navigate. The classic problems associated with navigating hypertext, namely spatial disorientation and cognitive overhead, are exacerbated in the case of hypermedia navigation. Spatial disorientation is typically caused by unfamiliar and/or complex link structures, leading to confusion as to the location of a user or to where the user should go from that location. Cognitive overhead consists of keeping track of link structure and link navigation history. Examples of cognitive overhead occurring in typical user tasks include a user being confused as to whether an item is a link and, if so, whether the user should take the link, has already taken the link, might have missed a link, or might not be able to return if taking the link. In addition to tracking link structure and navigation history and deciding whether and when to follow links, users of typical hypermedia systems must simultaneously be attentive to the changing media content, which incurs its own cognitive overhead.

The problems of spatial disorientation and cognitive overhead are compounded with linked time-based media, such as audio and video, whose content changes over time. Adding hyperlinks to video can add an additional cognitive load, resulting in an increased likelihood of user confusion. While multi-level hypervideo summaries can allow people to view a video summary and, at any point, follow a link to access additional related details, users can still get lost trying to navigate the links.

Systems and methods in accordance with embodiments of the present invention can avoid such link navigation problems by building on the observation that people often do not actually want links to related content, but desire control over multiple views of related content or control over the amount of detail displayed about that content. A user interface metaphor for video summaries can be shifted from one that emphasizes links and link structure to one that completely eliminates links in the user interface. Such an approach can allow users to “change channels” instead of “navigating” between streams. By hiding links from the user, using a channel-based metaphor, the entire user experience changes from one of navigating along links to one of switching between different representations of related content. By replacing the explicit links in a hypermedia system user interface with implicit and algorithmically generated links, certain problematic steps and cognitive processes otherwise associated with hypermedia navigation can be eliminated. These steps and processes can include, for example: detecting when links are available, explicitly following links, remembering which links have been followed, explicitly returning from a link, recognizing when the system implicitly returns from a link (e.g. when finished playing a sequence associated with a followed link), and maintaining a sense of context or location within a link structure. Instead of having to overcome the cognitive hurdles associated with “navigation,” users are free to focus on controlling different views of multimedia content. Such an approach can also simplify hypermedia authoring and maintenance by replacing the need for defining and maintaining explicit links and link behaviors with a two-step process of defining correlated media streams and defining an algorithm for dynamically determining link behavior.

Such an authoring process can be simpler than the typical hypermedia authoring process in several respects. Once correlated media streams are defined, for example, those streams can be edited arbitrarily without breaking any explicit links. Clips can be repositioned, and clips or entire channels can be added or removed without the need for maintaining existing links. In addition, once algorithms for determining link behavior are defined, those algorithms can be reused for many different sets of correlated media streams. Because the algorithms can be reused, the process of defining a link behavior algorithm in some embodiments can be reduced to selecting an algorithm from a core set of predefined algorithms. In many cases, such as the example of multi-level hypervideo summaries, the authoring of correlated media streams can be entirely automated.

In one embodiment, links are hidden during the multimedia stream authoring process. One such authoring process consists of at least two major steps, the first of which can include the creation, definition, or assembly of the media streams, which may be linked temporally or semantically (e.g. multi-level video summaries). The second such step can include a mapping of media streams to channels, including an algorithm for automatically determining link behavior based on the stream correlation and the time at which a channel change is requested.

An example of such a method is shown in FIG. 8. A linear video can be automatically divided into clips and takes 600. Clips to be used in the channel-based summary can be automatically selected from the video using predetermined criteria 602. The number of levels to be generated can be automatically determined, as well as the length of each level 604. Clips can be automatically selected for each level, and can be grouped into composites where appropriate 606. Each level of the summary can be automatically mapped to a respective channel 608, and implicit links can be automatically generated between the channels 610.

Another example is shown in FIG. 9. In this example, media streams having related content can be automatically generated from a video 700. The generated media streams can be mapped to respective channels 702, such that a user can switch between channels to view related content, such as further examples, additional scenes, or more detailed information. An algorithm can be automatically selected, generated, or defined in order to dynamically determine the links and link behavior between the channels 704. These implicit links can be automatically generated between the channels using the algorithm 706, such that a user can switch between channels on an appropriate display device, such as a video player on a personal computer or home video equipment.

Certain embodiments can be designed under the assumption that users will experience multimedia content within the context of a client “player” application that allows the users to pause and “rewind” the current media sequence, as well as follow links to associated media sequences. A simplification can also be imposed: at any time, in any stream, there is at most one link to another stream. Such a simplification can be useful from an authoring perspective as well as a user perspective, as both authors and users have fewer links to manage and maintain. In practice, however, users may have multimedia players that provide the capability to follow a link and return to the point in the video clip that was playing when the user selected the link. In some cases this “interrupt and return” capability can effectively give users two links at once: one link back and one link forward to a new linked clip. In fact, if multimedia players provide the capability to “interrupt and return” back up multiple levels of links, users will effectively have multiple links that can be taken. While users may be able to take more than one link back to clips they had been viewing, the users can be limited to taking at most one link forward.

Embodiments in accordance with the present invention can also take advantage of another simplification, referred to herein as a video “composite.” When a video composite is used to group clips at the source of a link, the link may be taken at any point in any clip in the composite. When a composite is used to group clips at the destination of a link, the composite can be treated as a single clip. Composites can be used to make sure that it is possible to follow one link “forward” to a higher-numbered channel at any point in any stream, unless the user is already on the highest-numbered channel, and that it is possible to follow at least one link “back” to a lower-numbered channel, unless the user is already on the lowest-numbered channel.

In an exemplary application of such a simplification approach, the user of a hypermedia system having multi-level video summaries can be faced with the task of understanding the content of a particular video sequence. The user can accomplish this task by watching any of the media streams and “changing channels” at any time to receive more or less detail. Because all explicit links between the correlated media streams are hidden, the user need not experience the cognitive overhead or spatial disorientation normally associated with link-based navigation.

Before a user can experience video summaries as correlated media channels, the channels must first be defined. Automatic techniques for generating multi-level video summaries, such as those described above, can be used to determine the sequences of video clips that make up the video summaries. The video summaries can be mapped to channels, such as by mapping each level to a different channel, such that the first channel corresponds to the briefest, or highest-level, summary and successive channels map to successively more detailed, or lower-level, summaries.

For instance, channels can be determined by the multi-level video summary depicted in FIG. 4. Here the top-level, briefest summary 302 can be mapped to channel 1, an intermediate-level summary 304 can be mapped to channel 2, and the most detailed summary, or the entire video 306, can be mapped to channel 3. Composites are used in channels 2 and 3 in the figure to group clips for both the source and destination of links.

Exemplary Algorithms

A number of algorithms can be used to define link behavior in linked summaries. Such algorithms can determine properties such as the sequence, file, and offset to load into a player when a user changes a channel. An algorithm can use information about the video summaries, such as the clip sequence, the media file, the offset where each clip is stored, the length of each clip, the composites that make up each summary, and the associations between composites. This information can be readily available from the summary representation, as this information can be used by a digital video player to play a summary sequence.

An exemplary link behavior algorithm that can be used in accordance with the above approach is given by the following:

    if (following or returning from a link) {
        if (source clip exists in the destination sequence)
            stay at the current position in the current clip, but switch
            to the new channel's play sequence;
        else if (changing back to a less detailed summary &&
                 the source composite has played more than T%
                 for some threshold T)
            jump to the end of the associated composite;
        else
            jump to an offset in the destination composite proportional
            to the amount of time the source composite has played;
    }
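The pseudocode above can be made concrete in Python. The sketch below is one possible reading under stated assumptions, not a definitive implementation: it assumes that lower channel numbers hold less detailed summaries, that clips are identified by a hypothetical `clip_id`, that associations between composites are given as a lookup keyed by `(channel, composite_id, target_channel)`, and that the threshold T is expressed as a fraction (0.5 for 50%).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Clip:
    clip_id: str
    length: float  # seconds


@dataclass(frozen=True)
class Composite:
    composite_id: str
    clips: Tuple[Clip, ...]

    @property
    def length(self) -> float:
        return sum(clip.length for clip in self.clips)

    def clip_at(self, t: float) -> Tuple[Clip, float]:
        """Return the clip playing t seconds into this composite and the offset into that clip."""
        start = 0.0
        for clip in self.clips:
            if t < start + clip.length:
                return clip, t - start
            start += clip.length
        last = self.clips[-1]
        return last, last.length  # t is at or past the end of the composite


@dataclass(frozen=True)
class Position:
    channel: int
    composite_id: str
    offset: float  # seconds from the start of the composite


# channel number -> ordered composites in that channel's play sequence
Channels = Dict[int, List[Composite]]
# (channel, composite_id, target_channel) -> associated composite_id on target_channel
Associations = Dict[Tuple[int, str, int], str]


def find_composite(channels: Channels, channel: int, composite_id: str) -> Composite:
    for composite in channels[channel]:
        if composite.composite_id == composite_id:
            return composite
    raise KeyError((channel, composite_id))


def change_channel(channels: Channels, associations: Associations,
                   pos: Position, target: int, threshold: float = 0.5) -> Position:
    """Compute where playback continues when switching from pos.channel to target."""
    source = find_composite(channels, pos.channel, pos.composite_id)
    clip, into_clip = source.clip_at(pos.offset)

    # Case 1: the source clip also appears in the destination sequence, so stay
    # at the same point in that clip and adopt the new channel's play sequence.
    for composite in channels[target]:
        start = 0.0
        for candidate in composite.clips:
            if candidate.clip_id == clip.clip_id:
                return Position(target, composite.composite_id, start + into_clip)
            start += candidate.length

    dest = find_composite(channels, target,
                          associations[(pos.channel, pos.composite_id, target)])
    played = pos.offset / source.length

    # Case 2: changing back to a less detailed summary after more than T of the
    # source composite has played: jump to the end of the associated composite.
    if target < pos.channel and played > threshold:
        return Position(target, dest.composite_id, dest.length)

    # Case 3: jump proportionally into the destination composite.
    return Position(target, dest.composite_id, played * dest.length)
```

With a one-composite top level holding clips a and e and a more detailed level holding a, b, c, and e (each 10 seconds long), switching down while clip c is playing 25 seconds into the detailed composite lands at the end of the top-level composite, while switching at 15 seconds (in clip b) lands proportionally at 7.5 seconds.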

Such an algorithm can be integrated with a digital video player to compute the link behavior as a user changes channels. If playback is required on an unmodified player, the channel-change behavior can instead be pre-computed for each pair of associated composites, and the resulting logic can be stored within a multimedia file format such as MPEG-4. Such an approach can, however, impose certain restrictions on the algorithms, such as eliminating the possibility of a proportional jump. If it is desired to jump to the beginning of a composite when following or returning from a link, an algorithm can include a step such as the following:

    if (following or returning from a link)
        jump to the beginning of the associated composite;
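Pre-computing the channel-change behavior amounts to building a static table with one entry per pair of associated composites. The Python sketch below assumes hypothetical inputs (each composite given as `(composite_id, media_file, start_offset)` and associations keyed by `(channel, composite_id, target_channel)`) and does not show how the table would be encoded in an MPEG-4 file. Because each entry is fixed at authoring time, only the jump-to-beginning behavior can be represented.

```python
from typing import Dict, List, Tuple

# Hypothetical input: for each channel, its composites as
# (composite_id, media_file, start_offset_seconds).
CompositeInfo = Tuple[str, str, float]
# (channel, composite_id, target_channel) -> associated composite_id on target_channel
Associations = Dict[Tuple[int, str, int], str]
# (channel, composite_id, target_channel) -> (media_file, offset) to load
JumpTable = Dict[Tuple[int, str, int], Tuple[str, float]]


def precompute_jump_table(channels: Dict[int, List[CompositeInfo]],
                          associations: Associations) -> JumpTable:
    """For every pair of associated composites, record the file and offset an
    unmodified player should load: the beginning of the destination composite.

    The entries do not depend on how much of the source composite has played,
    which is why a proportional jump cannot be expressed this way.
    """
    starts = {(channel, cid): (media_file, offset)
              for channel, composites in channels.items()
              for cid, media_file, offset in composites}
    return {(src_channel, src_id, dst_channel): starts[(dst_channel, dst_id)]
            for (src_channel, src_id, dst_channel), dst_id in associations.items()}
```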

In the case of multi-level video summaries, certain embodiments can require that successively less detailed summaries be proper subsets of each other. Such a requirement can guarantee that each possible source clip will exist in the destination sequence when changing channels to a more detailed summary. The requirement can therefore preserve temporal continuity, which can result in a more satisfying user experience when changing channels.
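An authoring tool could verify the proper-subset requirement before publishing a set of channels. The following Python sketch assumes each level is given as a sequence of hypothetical clip identifiers, ordered from the briefest summary to the most detailed.

```python
from typing import List, Sequence


def is_properly_nested(levels: List[Sequence[str]]) -> bool:
    """Check that each summary level (briefest first) is a proper subset of the next.

    When this holds, any clip playing in a less detailed summary exists in every
    more detailed summary, so switching to a more detailed channel can always
    stay at the current position in the current clip.
    """
    for brief, detailed in zip(levels, levels[1:]):
        if not set(brief) < set(detailed):  # "<" on sets is the proper-subset test
            return False
    return True
```

Python's `<` operator on sets tests for a proper subset, so two identical levels are rejected, as is a brief level containing a clip missing from the next level.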

An example algorithm for correlated media channels can be demonstrated using a training hypervideo containing several demonstrations of a process or technique that a user is trying to learn, such as a martial arts kick. The correlated media channels can be arranged as in FIG. 6, with demonstrations of each move performed by the master instructor on the main, or “master,” channel 400, demonstrations of the same moves performed by the best students, or black belts, on the next channel 402, and demonstrations of the same moves performed by students of successively lower ability on successively higher-numbered channels 404, 406. Unlike the multi-level summary example, in which successive channels contain an expanded, more detailed version of the same content, here successive channels can contain additional examples of the same content. While learning a particular move, such as Kick 1, a user can switch to a higher-numbered channel to view more examples of Kick 1. If the videos are recordings of the user's actual class, a particular channel might show the user performing Kick 1, so that the user can either simply watch that channel for each of the moves or switch between the instructor and the user to compare and evaluate technique.

As in the multi-level video summary example, the authoring task can consist of defining the media streams and their correlations. Each stream can be ordered by category (Kick 1, Kick 2, punches, etc.), while successive streams can store additional examples that may be of lower quality or relevance. In this case a simpler algorithm can be used to determine link behavior. Because the channels do not store the same clips, the link behavior can simply switch to the beginning of the associated composite each time a channel is changed. If the length and arrangement of the clips are substantially similar across channels, the link behavior can instead switch to approximately the same point in the other channel.
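The simpler behavior for example-based channels can be sketched as follows. This is an illustration under assumptions, not the specification's algorithm: composites are associated by category name, each channel is a hypothetical list of `(category, length)` pairs, and "substantially similar" is approximated by a hypothetical `tolerance` on composite length only (clip arrangement is not compared).

```python
from typing import Dict, List, Tuple

# Each channel is an ordered list of (category, composite_length_seconds), e.g.
# ("Kick 1", 40.0). Categories and lengths here are illustrative only.
Channel = List[Tuple[str, float]]


def switch_example_channel(channels: Dict[int, Channel], category: str,
                           offset: float, source: int, target: int,
                           tolerance: float = 0.1) -> Tuple[str, float]:
    """Return (category, offset) at which to start playing on the target channel.

    If the associated composites have lengths within `tolerance` (a fraction of
    the longer one), keep approximately the same point; otherwise start at the
    beginning of the associated composite.
    """
    source_length = dict(channels[source])[category]
    target_length = dict(channels[target])[category]
    if abs(source_length - target_length) <= tolerance * max(source_length, target_length):
        return category, min(offset, target_length)
    return category, 0.0
```

For example, switching from a 40-second Kick 1 composite to a 42-second one keeps the current offset, while switching between a 30-second and a 60-second Kick 2 composite restarts at the beginning.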

Audio

Systems and methods in accordance with embodiments of the present invention also have applications in other media, such as audio-only hypermedia. For example, a digital version of the popular audio “books on tape” can provide a channel-changing interface for listening to linked audio summaries that provide different amounts of detail. Audio summaries can be created and organized using methods and algorithms similar to those discussed above with respect to multi-level video summaries. Users can locate particular positions in the audio content by first listening to lower-numbered channels, or summaries with less detail, to locate particular chapters. A user can then switch to higher-numbered channels, or summaries with more detail, to locate particular sections. Once the section of interest has been located, the user can switch to the highest-numbered channel for the unabridged content. Certain embodiments can also generate summaries using a combination of the above-described criteria for both audio and video, such as for an outdoor concert video in which the amount of lighting or “goodness” parameters might not change substantially, but the sound level changes dramatically between songs.
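For the concert example, one simple audio criterion is to look for sustained drops in sound level between songs. The Python sketch below uses a plain threshold technique chosen for illustration, not one named in the specification; it assumes per-sample sound levels in decibels have already been computed elsewhere, and the `quiet_db` and `min_run` values are hypothetical. The resulting gaps could then be combined with video criteria such as lighting or camera motion.

```python
from typing import List, Tuple


def quiet_gaps(levels_db: List[float], quiet_db: float = -40.0,
               min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end exclusive, where the sound level
    stays below `quiet_db` for at least `min_run` consecutive samples.

    In a concert recording such gaps are candidate boundaries between songs.
    """
    gaps: List[Tuple[int, int]] = []
    run_start = None
    # An infinite sentinel is never quiet, so it closes any run at the end.
    for i, level in enumerate(levels_db + [float("inf")]):
        if level < quiet_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                gaps.append((run_start, i))
            run_start = None
    return gaps
```

With the default parameters, a level sequence of `[-10, -12, -50, -55, -60, -11, -9, -45, -10]` yields the single gap `(2, 5)`; the isolated quiet sample at index 7 is too short to count as a boundary.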

Perception of Continuity

One interesting question concerning the channel metaphor described above is how a user will perceive the continuity of media on channels that are not currently being displayed. When changing between channels on a television set, for example, a user knows that the media on other channels will continue to broadcast whether or not the user is watching those channels. If the user changes away from channel 4 at time T1 and changes back to channel 4 at time T2, the portion of the media that was broadcast between times T1 and T2 is essentially lost to the user (disregarding rebroadcast, recording, etc.).

In the case of digital media, and particularly digital media in a non-broadcast context, the perception that media-related information would be lost is less of a concern. Users are accustomed to digital media being available at any time, such that a user can always go back and locate the media for viewing. The perception of media continuity can still be at least partly determined by the link return behavior. For example, an algorithm such as those described above can be selected that will return to the end of the “calling” sequence when a user finishes watching a lower-level summary of that sequence on another channel. The algorithm can also use a threshold T, which may be any appropriate value such as about 25%, about 50%, or about 75%, such that if the user returns after watching at least T percent of a lower-level sequence, the user can be directed to the end of the higher-level sequence. The user may then perceive that changing channels to a more detailed summary does not stop the less detailed summary channel from playing. A user may expect this behavior with content such as multi-level summaries, which are, in general, proper subsets of each other, as there would be little point in viewing a less detailed summary after viewing the associated, more detailed summary of the same content. If a user watches half of the more detailed content and loses interest, the user may also simply wish to move on to the next sequence instead of viewing the rest of the higher-level summary, which is no longer of interest.
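The threshold-based return behavior can be reduced to a small function. In this Python sketch, offsets are in seconds and the threshold is a fraction; the fallback used below the threshold (resuming at the point where the link was followed, one of the return positions listed in the claims) is an assumption, since the passage only specifies the behavior at or above T.

```python
def return_offset(calling_length: float, link_point: float,
                  watched: float, detailed_length: float,
                  threshold: float = 0.5) -> float:
    """Offset (seconds) at which the calling, less detailed sequence resumes.

    If at least `threshold` of the more detailed sequence was watched, skip to
    the end of the calling sequence, as though that channel had kept playing.
    Otherwise resume at the point where the link was followed (an assumed
    fallback; a proportional jump would be another option).
    """
    if detailed_length <= 0:
        raise ValueError("detailed_length must be positive")
    if watched / detailed_length >= threshold:
        return calling_length
    return link_point


if __name__ == "__main__":
    # A 20 s calling sequence linked at 5 s to a 40 s detailed sequence, 20 s watched.
    for t in (0.25, 0.5, 0.75):
        print(t, return_offset(20.0, 5.0, 20.0, 40.0, threshold=t))
```

Having watched half of the detailed sequence, the user is sent to the end of the calling sequence for thresholds of 25% and 50%, but resumes at the 5-second link point when T is 75%.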

In contrast, a situation such as that shown in FIG. 5 can exist, in which successive channels are not subsets of each other. In this case, it may not be desirable for an algorithm to define the link return behavior to skip to the end of the calling sequence. An algorithm that always jumps to the beginning of a linked composite can be overly simplistic, and in some cases can cause viewers to see some clips more than once, but it can provide the perception that changing channels to a more detailed summary effectively stops the less detailed summary channel from playing.

The foregoing description of preferred embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to one of ordinary skill in the relevant arts. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

1. A method for automatically generating a multi-level video summary, comprising: automatically dividing a video file into video segments using segmenting criteria; automatically generating at least one summary level including video segments from the video file, the video segments in each summary level selected using selection criteria; and automatically generating navigational links between video segments in the summary levels, the navigational links connecting video segments containing related material.
2. A method according to claim 1, further comprising: automatically determining the length of each summary level.
3. A method according to claim 1, further comprising: automatically grouping video segments in a summary level into a video composite, the video composite including at least two video segments in the summary level.
4. A method according to claim 1, further comprising: providing a user interface whereby a user can view the multi-level video summary, the user interface allowing the user to navigate between summary levels using the navigational links.
5. A method according to claim 1, wherein: automatically generating at least two summary levels further includes generating summary levels each having a different level of detail for related video segments.
6. A method according to claim 1, further comprising: automatically determining the number of summary levels to generate.
7. A method according to claim 1, further comprising: automatically determining which navigational links to generate.
8. A method according to claim 1, further comprising: providing at least one algorithm to be used in generating a multi-level video summary.
9. A method according to claim 1, wherein: the selection criteria includes criteria selected from the group consisting of goodness, smoothness of camera operation, amount of camera motion, location in the video, and lighting level.
10. A method according to claim 1, further comprising: providing the ability for an author to refine an automatically-generated multi-level video summary.
11. A method according to claim 1, further comprising: including the first and last video segment from the video file in the summary levels.
12. A method according to claim 1, further comprising: ensuring that the selection of video segments includes video segments distributed throughout the video file.
13. A method according to claim 1, wherein: each navigational link includes a source anchor in one summary level, a destination anchor in another summary level, and at least one return behavior.
14. A method according to claim 13, wherein: each navigational link further includes a label.
15. A method according to claim 13, further comprising: automatically grouping some of the video segments in a summary level into a video composite that will be a source anchor for a link to another summary level.
16. A method according to claim 1, wherein: the video segments in each summary level are in chronological order as the video segments appear in the video file.
17. A method according to claim 1, wherein: each summary level includes a different number of video segments.
18. A method according to claim 13, wherein: the return behavior includes a return position selected from the group consisting of the beginning of a video segment, the point in a video segment at which a navigational link is followed, and the end of a video segment.
19. A system for automatically generating a multi-level video summary, comprising: means for automatically dividing a video file into video segments using segmenting criteria; means for automatically generating at least one summary level including video segments from the video file, the video segments in each summary level selected using selection criteria; and means for automatically generating navigational links between video segments in the summary levels, the navigational links connecting video segments containing related material.
20. A computer program product for execution by a processor for automatically generating a multi-level video summary, comprising: computer code for automatically dividing a video file into video segments using segmenting criteria; computer code for automatically generating at least one summary level including video segments from the video file, the video segments in each summary level selected using selection criteria; and computer code for automatically generating navigational links between video segments in the summary levels, the navigational links connecting video segments containing related material.